Assessing the practicality of using a single knowledge‐based planning model for multiple linac vendors

Abstract
Purpose: Knowledge‐based planning (KBP) has been shown to be an effective tool for quality control in intensity‐modulated radiation therapy treatment planning and for generating high‐quality plans. Previous studies have evaluated its ability to create consistent plans across institutions and between planners within the same institution, as well as its use as a teaching tool for inexperienced planners. This study evaluates whether plan quality is consistent when a single KBP model is used to plan for different treatment machines.
Materials and methods: This study used a vendor-provided RapidPlan model (Varian Medical Systems), to which we added planning objectives, maximum dose limits, and planning structures so that a clinically acceptable plan is achieved in a single optimization. This model was used to generate and optimize volumetric‐modulated arc therapy plans for a cohort of 50 patients treated for head‐neck cancer. Plans were generated for the following treatment machines: Varian 2100, Elekta Versa HD, and Varian Halcyon. A noninferiority testing methodology was used to evaluate the hypothesis that normal tissue and target metrics in our autoplans were no worse than those of a set of clinically acceptable baseline plans by more than a margin of 1.8 Gy (dose metrics) or 3% (dose‐volume metrics). The quality of these plans was also compared using common clinical dose‐volume histogram (DVH) criteria.
Results: The Versa HD met our noninferiority criteria for 23 of 34 normal tissue and target metrics, while the Halcyon and Varian 2100 machines met our criteria for 24 of 34 and 26 of 34 metrics, respectively. The experimental plans tended to have reduced planning target volume (PTV) coverage at the prescription dose and larger hotspot volumes. However, comparable plans were generated across the different treatment machines.
Conclusions: These results support the use of a head‐neck RapidPlan model in centralized planning workflows that serve clinics with different linac models and vendors, although some fine‐tuning for targets may be necessary.


INTRODUCTION
Intensity-modulated radiation therapy treatment planning is a challenging and time-consuming process whose quality can vary between planners across institutions 1,2 and even within the same institution. 3 Knowledge-based planning (KBP) has been shown to be effective in creating high-quality treatment plans, 4-9 reducing variability between planners, 9 and evaluating plan consistency and quality. 2,3,11-14 These factors make KBP models well suited to centralized planning systems, such as the Radiation Planning Assistant (RPA). The RPA is a web-based automated treatment planning system that is being developed to provide high-quality contours and treatment plans to clinics with limited resources around the world. Its development has been described in previous works. 5,15-22 Through the Eclipse application programming interface, the RPA uses RapidPlan (Varian Medical Systems, Palo Alto, CA), a commercial KBP system, to generate and optimize treatment plans for various cancer sites.
Because we aim to provide RPA-generated plans to multiple clinics around the world, we needed to evaluate how plan quality is affected when a single RapidPlan model is used to generate plans for different treatment machines with different beam qualities and multileaf collimator (MLC) characteristics. In this study, we started with a KBP approach developed for head and neck treatment on a Varian 2100 series linac, whose plan quality had been validated by radiation oncologists from multiple institutions. We then used the same approach to create volumetric-modulated arc therapy (VMAT) plans for treatment on Versa HD (Elekta) and Halcyon (Varian) treatment devices. The plans were then dosimetrically evaluated by comparing them with our baseline, physician-reviewed autoplans. Previous studies have shown the robustness of KBP models to tumor location, treatment modality, and institutional protocols, 23,24 but to our knowledge, this is the first study to evaluate a RapidPlan model across different treatment machines. This comparison is important for centers (or treatment planning services) that need to accommodate different linac models and vendors.

MATERIALS AND METHODS
Patient data
For this analysis, a cohort of 50 patients with head and neck cancer was retrospectively collected and deidentified. This study was approved by the institutional review board. All patients were previously treated using VMAT. The patients' original, physician-drawn target and normal tissue contours, along with the original CT scan and dose prescription, were used in autoplan generation. The patients had primary tumors from various

Plan optimization
The RPA uses a RapidPlan model to optimize head and neck (HN) plans in the Eclipse Treatment Planning System (Varian Medical Systems, Palo Alto, CA). This model was developed using the vendor-provided Washington University HN RapidPlan model as a starting point, to which additional planning objectives, maximum dose limits, and planning structures were added. The model was refined through iterative testing and physician feedback using a set of validation patients who were not included in this study. The full planning strategy is outlined in a previous publication by our group. 5 The model was optimized for a Varian 2100 linac.

Baseline plans
In a previous study, our group generated a set of Varian 2100 autoplans for this patient cohort using Eclipse version 15.5. In that study, the autoplans were compared with the original, clinically approved plans in a blinded physician review and scored for clinical acceptability based on contour quality and dose coverage. 5 The reviewers deemed 49 of the 50 autoplans clinically acceptable. These physician-reviewed autoplans were designated our baseline plans and served as the reference for our comparison.

Plan generation
Since the creation of our baseline plans, we have upgraded Eclipse to version 15.6. Three plans were generated per patient using the following machines: Varian 2100, Elekta Versa HD, and Varian Halcyon. We re-planned the Varian 2100 cases in Eclipse v15.6 to verify that any observed dose differences were not the result of the change in Eclipse version. The plans generated in Eclipse v15.6 were designated our experimental plans.
The Varian 2100 (both baseline and experimental) and Versa HD plans consisted of three 360° coplanar treatment arcs with collimator angles of 15°, 345°, and 90°. The Halcyon plans consisted of four treatment arcs with collimator angles of 0°, 45°, 90°, and 315°. Three of the 50 patients had two planning target volume (PTV) dose levels, and the other 47 had three PTV dose levels.
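The beam arrangements above can be captured as a small per-machine configuration that a centralized planning service could pair with a single RapidPlan model. The sketch below is a hypothetical Python representation: the `ArcTemplate` class and `TEMPLATES` dictionary are illustrative names, not part of Eclipse or the RPA, and the Halcyon gantry span is left unspecified because the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArcTemplate:
    """Hypothetical description of one machine's VMAT arc arrangement."""

    machine: str
    collimator_angles_deg: Tuple[int, ...]  # one collimator angle per arc
    gantry_span_deg: Optional[int]  # 360 for full coplanar arcs; None if not stated

    @property
    def n_arcs(self) -> int:
        # One arc per listed collimator angle.
        return len(self.collimator_angles_deg)


# Arc arrangements as described in the text (Halcyon gantry span not given).
TEMPLATES = {
    "Varian 2100": ArcTemplate("Varian 2100", (15, 345, 90), 360),
    "Elekta Versa HD": ArcTemplate("Elekta Versa HD", (15, 345, 90), 360),
    "Varian Halcyon": ArcTemplate("Varian Halcyon", (0, 45, 90, 315), None),
}
```

For example, `TEMPLATES["Varian Halcyon"].n_arcs` evaluates to 4. Keeping the beam geometry in a per-machine table, separate from the knowledge-based model, mirrors the study design: the RapidPlan model stays fixed while only the machine and arc arrangement change.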

Evaluation process
To determine the quality of our experimental plans, we used a one-sided Mann-Whitney U test at the 5% significance level to determine whether the experimental doses were noninferior to the baseline doses within a margin, δ, chosen based on clinical judgement. 25 For metrics in which higher values are better, we tested the hypotheses

$H_0: M_E - M_B \le -\delta$ versus $H_1: M_E - M_B > -\delta$,

where $M_E$ and $M_B$ are the median values of a metric in the experimental and baseline plans; for metrics in which lower values are better, the hypotheses are $H_0: M_E - M_B \ge \delta$ versus $H_1: M_E - M_B < \delta$. Experimental plans are considered noninferior if the null hypothesis is rejected (i.e., p < 0.05). A 95% confidence interval for $M_E - M_B$ was also calculated to support the noninferiority conclusion. For DVH metrics in which lower values were better, the upper limit of the 95% confidence interval needed to be less than δ to conclude noninferiority; for metrics in which higher values were better, the lower limit needed to be greater than −δ. For normal structures, D95%, and D98%, δ = 1.8 Gy (i.e., 3% of the lowest dose prescription in our dataset); for V95%, V100%, V105%, and V110%, δ = 3%. The number of plans that met established, clinically accepted dosimetric criteria outlined in Radiation Therapy Oncology Group (RTOG) protocol 1016 28 was also calculated and compared to assess the clinical acceptability of the autoplans.

RESULTS
Table 2 shows the confidence intervals and p-values from our statistical test for each DVH criterion. The Versa HD was noninferior (p < 0.05) for 23 of 34 DVH metrics evaluated, the Halcyon for 24 of 34, and the Varian 2100 re-plan for 26 of 34. We were not able to conclude noninferiority for the brain, brainstem, both cochleae, ipsilateral parotid, brainstem with 5-mm margin, and spinal cord with 5-mm margin. For some DVH metrics, we were able to conclude noninferiority for some experimental machines (Versa HD, Halcyon, 2100 v15.6), but not all. Examples are V100% and V105% for the intermediate-dose PTV and V105% for the low-dose PTV, for which the p-values were above 0.05 for the Versa HD and Halcyon but below 0.05 for the 2100 v15.6.
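The noninferiority test described in the Evaluation process can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' analysis code: it uses the large-sample normal approximation to the Mann-Whitney U distribution (with tie and continuity corrections) instead of exact p-values, applies the margin by shifting the experimental values by δ before a standard one-sided test, and assumes the confidence interval for $M_E - M_B$ is the Hodges-Lehmann (Moses) interval built from sorted pairwise differences, which ignores ties. All function and constant names are hypothetical. In practice, `scipy.stats.mannwhitneyu` with its `alternative` argument could replace the hand-written p-value.

```python
import math
from statistics import NormalDist

DOSE_MARGIN_GY = 1.8  # 3% of the lowest prescription in the dataset: 0.03 * 60 Gy
VOLUME_MARGIN_PCT = 3.0  # margin for V95%, V100%, V105%, V110%


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y, alternative):
    """One-sided Mann-Whitney U p-value via the normal approximation.

    alternative="less": x tends to be smaller than y.
    alternative="greater": x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    u1 = sum(_average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())  # tie correction
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all values are tied; the test is undefined")
    if alternative == "greater":
        return 1 - NormalDist().cdf((u1 - mu - 0.5) / sigma)
    if alternative == "less":
        return NormalDist().cdf((u1 - mu + 0.5) / sigma)
    raise ValueError("alternative must be 'less' or 'greater'")


def noninferiority_p(experimental, baseline, delta, higher_is_better):
    """p-value for H0: experimental plans are worse than baseline by at least delta."""
    if higher_is_better:
        # H0: M_E - M_B <= -delta  ->  test whether E + delta tends to exceed B
        return mann_whitney_p([e + delta for e in experimental], baseline, "greater")
    # H0: M_E - M_B >= delta  ->  test whether E - delta tends to fall below B
    return mann_whitney_p([e - delta for e in experimental], baseline, "less")


def shift_confidence_interval(experimental, baseline, confidence=0.95):
    """Two-sided CI for M_E - M_B from sorted pairwise differences (no tie handling)."""
    n1, n2 = len(experimental), len(baseline)
    diffs = sorted(e - b for e in experimental for b in baseline)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    k = math.floor(n1 * n2 / 2 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12))
    k = max(k, 1)  # very small samples: fall back to the full range
    return diffs[k - 1], diffs[n1 * n2 - k]


def is_noninferior(ci, delta, higher_is_better):
    """CI rule from the text: lower limit > -delta, or upper limit < delta."""
    lower, upper = ci
    return lower > -delta if higher_is_better else upper < delta
```

For a normal-structure dose metric, one would call `noninferiority_p(exp_doses, base_doses, DOSE_MARGIN_GY, higher_is_better=False)` and check the interval with `is_noninferior(shift_confidence_interval(exp_doses, base_doses), DOSE_MARGIN_GY, False)`; for a coverage metric such as V100%, `VOLUME_MARGIN_PCT` and `higher_is_better=True` apply. Shifting by δ turns the noninferiority question into an ordinary one-sided rank test, which is why the same Mann-Whitney routine serves both directions.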
Figure 1 shows the distributions of planned dose to the brainstem, right cochlea, and ipsilateral parotid, and of low-dose PTV coverage at 100%, 105%, and 110% of the prescription dose. For the brainstem, right cochlea, ipsilateral parotid, and low-dose PTV coverage at 105% and 110%, the distributions showed very good agreement between the experimental and baseline plans, with R² values greater than 0.82 for all experimental machines. However, agreement was poor (R² < 0.50) for low-dose PTV coverage at 100% because of an outlier. Figure 2 shows a DVH for one patient, comparing the brainstem, both parotids, and the PTVs for the baseline and experimental autoplans. All four plans showed good agreement for these structures.
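The R² values reported for Figure 1 summarize how closely the per-patient experimental values track the baseline values. The text does not state which R² definition was used; the sketch below assumes the squared Pearson correlation (equivalently, the R² of an ordinary least-squares line), and the V100% numbers in it are invented for illustration only. It shows how a single outlier can pull R² below 0.5 even when every other patient agrees closely.

```python
def r_squared(baseline, experimental):
    """Squared Pearson correlation between paired per-patient values.

    Equal to the R^2 of an ordinary least-squares line fitted to the
    (baseline, experimental) pairs; this definition is an assumption.
    """
    n = len(baseline)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equal-length lists with at least two patients")
    mean_b = sum(baseline) / n
    mean_e = sum(experimental) / n
    s_bb = sum((b - mean_b) ** 2 for b in baseline)
    s_ee = sum((e - mean_e) ** 2 for e in experimental)
    s_be = sum((b - mean_b) * (e - mean_e) for b, e in zip(baseline, experimental))
    if s_bb == 0 or s_ee == 0:
        raise ValueError("R^2 is undefined when either set of values is constant")
    return s_be**2 / (s_bb * s_ee)


if __name__ == "__main__":
    # Illustrative (invented) low-dose PTV V100% values for five patients.
    baseline = [95.0, 96.0, 97.0, 98.0, 99.0]
    agreeing = [94.8, 95.9, 97.1, 97.8, 99.2]
    with_outlier = [94.8, 95.9, 97.1, 97.8, 80.0]  # one patient far below baseline
    print(f"R^2 without outlier: {r_squared(baseline, agreeing):.3f}")
    print(f"R^2 with one outlier: {r_squared(baseline, with_outlier):.3f}")
```

With these invented numbers, R² is about 0.99 without the outlier and about 0.35 with it, which illustrates why a rank-based test and a correlation summary can tell different stories for the same metric.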

DISCUSSION
The goal of this study was to evaluate the ability of a single RapidPlan model to generate comparable plans across different treatment machines that have different beam and MLC characteristics. Using a RapidPlan model that was optimized for a Varian 2100 linac, we generated plans that were noninferior to our baseline for at least 23 of the 34 DVH metrics (about 68%) evaluated for each set of experimental plans. For the metrics for which we could not conclude noninferiority, we investigated why the noninferiority criterion was not met. We noticed outliers for the Halcyon and Varian 2100 (v15.6) machines in the following normal structures: brain, brainstem, right cochlea, left cochlea, and brainstem (with 5-mm margin). These outliers came from the same two patients, one of whom was the patient whose baseline plan was judged clinically unacceptable in the aforementioned previous study. 5 Both of these patients were planned with two PTVs. The Halcyon and Varian 2100 plans for these patients had large dose gradients around the cochleae, and one patient had a target close to the brain, which led to a large dose gradient around the brain and brainstem. We also noticed that, for these two patients, the Halcyon and Varian 2100 (v15.6) plans were more comparable to the original plans used to treat the patients in the clinic than to our baseline autoplans. The experimental Varian 2100 plans generated in Eclipse v15.6 were noninferior for the most DVH criteria; these plans were noninferior for all 17 DVH criteria for target structures.

TABLE 2 95% confidence interval (CI) and p-value of the one-sided Mann-Whitney test
The target volumes in our study were less homogeneous for the Versa HD and Halcyon plans than for the baseline plans. In general, these experimental plans had reduced coverage at the prescription dose (V95% and V100%) and larger hotspots (V105% and V110%) at all dose levels. This indicates that some additional fine-tuning of the plans may be needed when applying the standard RapidPlan approach to machines with different beam qualities and MLC characteristics. Such fine-tuning, however, has been shown to require minimal effort. 29

This study has some limitations. The small sample size limits the power of our tests. This is apparent for the mean dose to the parotids: for each machine for which we could not confirm noninferiority for the parotids, only one plan had a parotid dose above the 1.8 Gy margin, and in those plans the parotid dose was less than 1 Gy above the margin. A repeat of this study would require a larger sample size to increase the power of the tests. Although the baseline plans were determined to be clinically acceptable, we cannot draw the same conclusion for the experimental plans generated for this study. Furthermore, an inherent weakness of this study is that noninferiority can be confirmed only for the model under investigation. Nevertheless, the results show that it is possible to create noninferior plans with a KBP model across different treatment machines, and we have established a methodology for evaluating other KBP models across different treatment machines.
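The coverage metrics (V95% and V100%) and hotspot metrics (V105% and V110%) discussed in this section, together with the dose metrics D95% and D98% used in the noninferiority analysis, can be computed from the dose in each voxel of a structure. The following minimal sketch assumes equal voxel volumes and a plain list of voxel doses in Gy; a treatment planning system would instead work from the structure's full dose-volume histogram with interpolation. The function names are hypothetical.

```python
import math


def v_percent(doses_gy, prescription_gy, level_pct):
    """V_x%: percentage of the structure volume receiving at least x% of the prescription."""
    threshold = prescription_gy * level_pct / 100
    return 100 * sum(d >= threshold for d in doses_gy) / len(doses_gy)


def d_percent(doses_gy, volume_pct):
    """D_x%: the minimum dose received by the hottest x% of the structure volume."""
    ordered = sorted(doses_gy, reverse=True)
    k = math.ceil(volume_pct * len(ordered) / 100)  # voxels in the hottest x%
    k = min(max(k, 1), len(ordered))
    return ordered[k - 1]


def coverage_and_hotspots(doses_gy, prescription_gy):
    """Summarize the target metrics used in the noninferiority analysis."""
    return {
        "V95%": v_percent(doses_gy, prescription_gy, 95),
        "V100%": v_percent(doses_gy, prescription_gy, 100),
        "V105%": v_percent(doses_gy, prescription_gy, 105),
        "V110%": v_percent(doses_gy, prescription_gy, 110),
        "D95%": d_percent(doses_gy, 95),
        "D98%": d_percent(doses_gy, 98),
    }
```

A plan whose V95% and V100% fall below its baseline while V105% and V110% rise would show the less homogeneous target dose described for the Versa HD and Halcyon plans.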
Evaluating the effects of applying a single RapidPlan model to linacs from various vendors is important as more automation tools, such as the RPA, are developed to enable centralized planning. It will be important to understand the strengths and weaknesses of these models, as they will be used with multiple linac models and vendors around the world. This study showed very good agreement between plans generated for different linac types, although some fine-tuning of the model may be needed to improve target coverage and minimize hotspots.

ACKNOWLEDGEMENTS
The Radiation Planning Assistant project is partially funded by the National Cancer Institute (NIH CA-202665) and Varian Medical Systems.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.